A Convergence Theory for the Structured
BFGS Secant Method with an Application
to Nonlinear Least Squares1,2,3

J.E. Dennis Jr.4, H.J. Martinez5 and R.A. Tapia4


Abstract. In 1981, Dennis and Walker developed a convergence theory for
structured secant methods which included the PSB and the DFP secant
methods, but not the straightforward structured version of the BFGS secant
method. Here we fill this gap in the theory by establishing a convergence
theory for the structured BFGS secant method. A direct application of our
new theory gives the first proof of local and q-superlinear convergence of the
important structured BFGS secant method for the nonlinear least-squares
problem which is used by Dennis, Gay and Welsch in the current version of
the popular and successful NL2SOL code.

1. Introduction.

Historically, by a secant method for the unconstrained optimization problem

\[ \text{minimize } f(x) \tag{1} \]

where $f : \mathbb{R}^n \to \mathbb{R}$, we mean the iterative procedure

\[ x_+ = x + s, \qquad B_+ = B(x, s, y, B), \tag{2} \]

where $s$ and $y$ are defined by

\[ B s = -\nabla f(x), \tag{3} \]

\[ y = \nabla f(x_+) - \nabla f(x), \tag{4} \]

and the update $B_+$ must satisfy the secant equation

\[ B_+ s = y. \tag{5} \]

We interpret $B_+$ as an approximation to $\nabla^2 f(x_+)$ and $y$ as an
approximation to $\nabla^2 f(x_+) s$. Most interesting secant updates can be written in
the form


\[ B_+ = B + \Delta(s, y, B, v) \tag{6} \]

where

\[ \Delta(s, y, B, v) = \frac{(y - Bs) v^T + v (y - Bs)^T}{v^T s} - \frac{(y - Bs)^T s}{(v^T s)^2} \, v v^T \tag{7} \]

for some choice of the vector $v$. Following Dennis and Walker [Ref. 1] we call $v$
the scale of the particular secant update in question. The scale $v$ will usually
depend on $s$, $y$ or $B$, as is the case for the following well-known updates:


\[ \text{PSB:} \quad v = s, \tag{8} \]

\[ \text{DFP:} \quad v = y, \tag{9} \]

\[ \text{BFGS:} \quad v = y + \left( \frac{y^T s}{s^T B s} \right)^{1/2} B s. \tag{10} \]

In  what  follows,  we  will  write  v(s,y,B )  when  the  use  of  v  alone  may  cause 
confusion. 
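
As a minimal numerical sketch of (6)-(10), the following NumPy code builds the correction $\Delta(s, y, B, v)$ for the three scales and checks the secant equation (5). The function names `secant_correction` and `scale`, the random test data, and the use of NumPy are illustrative assumptions rather than part of the original development. The last lines compare the BFGS case with the familiar rank-two form of the BFGS update.

```python
import numpy as np

def secant_correction(s, y, B, v):
    """Correction Delta(s, y, B, v) of equation (7)."""
    r = y - B @ s          # residual y - Bs of the secant equation
    vs = v @ s             # scalar v^T s, assumed nonzero
    return (np.outer(r, v) + np.outer(v, r)) / vs - (r @ s) / vs**2 * np.outer(v, v)

def scale(name, s, y, B):
    """Scale v for the PSB (8), DFP (9) and BFGS (10) updates."""
    if name == "PSB":
        return s
    if name == "DFP":
        return y
    if name == "BFGS":
        return y + np.sqrt((y @ s) / (s @ B @ s)) * (B @ s)
    raise ValueError(f"unknown update: {name}")

# Hypothetical test data: B and H are symmetric positive definite, and
# y = H s guarantees y^T s > 0, which the BFGS scale requires.
rng = np.random.default_rng(0)
n = 4
G1, G2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
B = G1 @ G1.T + n * np.eye(n)
H = G2 @ G2.T + n * np.eye(n)
s = rng.standard_normal(n)
y = H @ s

# Every choice of scale should give B_+ s = y, i.e. equation (5).
secant_residual = {}
for name in ("PSB", "DFP", "BFGS"):
    B_plus = B + secant_correction(s, y, B, scale(name, s, y, B))
    secant_residual[name] = np.linalg.norm(B_plus @ s - y)

# The BFGS case of (7) with (10) should agree with the usual rank-two formula.
B_bfgs = B + secant_correction(s, y, B, scale("BFGS", s, y, B))
B_direct = B + np.outer(y, y) / (y @ s) - np.outer(B @ s, B @ s) / (s @ B @ s)
bfgs_gap = np.linalg.norm(B_bfgs - B_direct)
```

Writing all three updates through one function of the scale mirrors the way the text treats them: only $v$ changes, while the secant equation holds for any $v$ with $v^T s \neq 0$.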

Often, in practice, a part of $\nabla^2 f(x)$ is available and we need only
approximate the remaining part. Suppose that

\[ \nabla^2 f(x) = S(x) + C(x) \]

where $C : \mathbb{R}^n \to \mathbb{R}^{n \times n}$, the available part of $\nabla^2 f(x)$, is symmetric. In several
important applications, e.g. nonlinear least squares, $C(x)$ is composed of the
first-order information and $S(x)$ requires second-order information.

By a structured approximation of $\nabla^2 f(x)$ we mean an approximation of the
form

\[ B = A + C(x) \]

where $A$ is an approximation to $S(x)$. Moreover, if $B$ is updated according to
the formula

\[ B_+ = A_+ + C(x_+) \]

where

\[ A_+ = A + \Delta(s, y^{\#}, A, v) \tag{11} \]

and $y^{\#}$ is an approximation to $S(x_+) s$, then we call $B_+$ a
structured $\Delta$ approximation of $\nabla^2 f(x_+)$. Observe that the update (11) satisfies
the secant equation

\[ A_+ s = y^{\#}. \tag{12} \]

We obtain a structured $\Delta$ method for problem (1) if in (2) we use
$B_+ = A_+ + C(x_+)$ where $A_+$ is given by (11).

Historically, the major issues one faces in a particular application of
structure are the choice of $y^{\#}$ and the choice of scale $v$ in (11). In an effort to
give a choice for $y^{\#}$ which could be used when the structure did not suggest a
better choice, Dennis and Walker [Ref. 1] proposed the default choice

\[ y^{\#} = y - C(x_+) s \]

where $y$ is given by (4). The rationale for the default choice is quite
straightforward. This choice of $y^{\#}$ leads to an update $B_+ = A_+ + C(x_+)$ which
satisfies the standard (unstructured) form of the secant equation. To see this,
observe that if $A_+ s = y - C(x_+) s$, then $B_+ s = y$.

The primary criticism of the default choice is that it does not take full
advantage of structure; the quantity $y = \nabla f(x_+) - \nabla f(x)$ does not exploit
structure. It has been our experience that each application of structure suggests a
choice for $y^{\#}$ which takes advantage of structure and is superior to the default
choice.

While the ambiguity in the choice for $y^{\#}$ has not created serious problems
in the application of structured secant methods, the ambiguity in the choice of
scale has been the major obstacle to the development of successful structured BFGS
secant methods and of a general convergence theory for such methods. For this
reason, we first present a fairly complete historical development of the choice of
scale for structured secant methods and then present a general rule for choosing
the scale in structured secant updates.

The primary application for structured secant methods has been the
nonlinear least-squares problem (see Section 4). Work in this area includes Brown
and Dennis [Ref. 2], Dennis [Refs. 3, 4, 5], Betts [Ref. 6], Bartholomew-Biggs
[Ref. 7], Dennis and Welsch [Ref. 8], Dennis, Gay and Welsch [Refs. 9, 10], Dennis
and Walker [Ref. 1], Dennis and Schnabel [Ref. 11], Al-Baali and Fletcher [Ref.
12], Xu [Ref. 13], Mahdavi and Bartels [Ref. 14] and Toint [Ref. 15]. Several of
these works considered the structured PSB update. For the PSB update the issue
of the proper choice of scale does not arise, since the scale is $v = s$.

In all these works, the general problem of how the scale should be modified
when one decides to utilize structure in the secant update is not considered.
Indeed, it is interesting that in these works only Al-Baali and Fletcher [Ref. 12]
actually carried the structure into the choice of scale. However, they gave no
convergence analysis for their structured BFGS secant method. The only works
that contain a convergence analysis are Dennis and Walker [Ref. 1] and Xu [Ref.
13]. Dennis and Walker, as an application of their general theory, established
local and superlinear convergence for the structured PSB and DFP methods in
general and for the nonlinear least-squares problem in particular. Their theory
does not include structured BFGS secant methods. On the other hand, the BFGS
secant method for the nonlinear least-squares problem studied by Xu in Ref. 13
(see Ref. 12) is only mildly structured in that it utilizes structure in the choice of
$y$ as an approximation to $\nabla^2 f(x_+) s$, but not in $B_+$. As such, q-superlinear
convergence follows from the standard theory by viewing their choice for $y$ as a
perturbation of the standard unstructured choice for $y$.

The nonlinear least-squares problem is an important problem and the use of
structure is a significant part of the formulation of any secant method for this
problem. These two facts have been reinforced by the popularity and success of
the NL2SOL code of Dennis, Gay and Welsch [Refs. 9, 10]. This code originally
used the structured DFP secant update analyzed by Dennis and Walker in Ref. 1,
but now uses the structured BFGS secant update suggested by Al-Baali and
Fletcher in Ref. 12. While the authors report improved numerical results, there is
no local convergence theory for the new version of the algorithm. This lack of
theory played a major role in motivating the present work.

Another important application of structured secant methods was given
recently by Tapia [Ref. 17]. He extended the class of secant updates given by (6)-
(7) to updates for equality constrained optimization which utilize the structure
present in the Hessian of the augmented Lagrangian. Local and q-superlinear
convergence for the DFP and BFGS versions of these structured secant methods
was established under standard assumptions.

A  close  look  at  the  ingredients  in  Tapia’s  theory  [Ref.  17]  reveals  a  structure 
principle  which  we  can  extract  and  use  to  formulate  a  general  rule  for  defining 
the  scale  in  any  structured  secant  update.  This  structure  principle  also  provides 
an  insightful  way  of  viewing  the  structured  secant  approximation  when 
formulating  our  convergence  theory. 

Structure Principle. Assume that $\nabla^2 f(x) = S(x) + C(x)$. Given

\[ B = A + C(x) \]

as an approximation to $\nabla^2 f(x)$ we want

\[ B_+ = A_+ + C(x_+) \]

as an approximation to $\nabla^2 f(x_+)$, where $x_+ = x + s$.

Compute $B_+$ as an update of $A + C(x_+)$. Toward this end consider

\[ y^S = y^{\#} + C(x_+) s \]

as an approximation to $\nabla^2 f(x_+) s$ and let

\[ B^S = A + C(x_+). \]

The secant update of $B^S$ is

\[ B_+ = B^S + \Delta(s, y^S, B^S, v(s, y^S, B^S)). \]

Now, observe that $y^S - B^S s = y^{\#} - A s$, so that for any $v$

\[ \Delta(s, y^S, B^S, v) = \Delta(s, y^{\#}, A, v) \]

and we can write

\[ B_+ = A + C(x_+) + \Delta(s, y^{\#}, A, v(s, y^S, B^S)). \]

It now seems reasonable to define

\[ A_+ = A + \Delta(s, y^{\#}, A, v(s, y^S, B^S)) \tag{13} \]

and call it the structured secant update of $A$.

Remark 1.1. Clearly $A_+$ given by (13) satisfies the secant equation (12).

Remark 1.2. In essence the structure principle is saying that the scale
should take structure and the complete problem into account.

Remark 1.3. For the PSB update the structure principle leaves the scale
unchanged.

Remark 1.4. In his application [Ref. 17], Tapia calls the update $A_+$ which
results from the structure principle the augmented scale secant update.

Remark 1.5. In the remainder of this paper, when we refer to the structured
BFGS secant update, we will assume that in (7) the scale is $v(s, y^S, B^S)$ where
$v(s, y, B)$ is given by (10).

Remark 1.6. For the nonlinear least-squares problem our structured BFGS
secant update is the same as that suggested by Al-Baali and Fletcher in Ref. 12.
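
The structure principle can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' code: the names `secant_correction`, `bfgs_scale` and `structured_bfgs_update` are hypothetical, the matrices are random test data, and `C_plus` stands for $C(x_+)$. The sketch forms $A_+$ by (13) with the BFGS scale (10) evaluated at $(s, y^S, B^S)$, then checks the secant equation (12) and the identity $\Delta(s, y^S, B^S, v) = \Delta(s, y^{\#}, A, v)$.

```python
import numpy as np

def secant_correction(s, y, B, v):
    """Correction Delta(s, y, B, v) of equation (7)."""
    r = y - B @ s
    vs = v @ s
    return (np.outer(r, v) + np.outer(v, r)) / vs - (r @ s) / vs**2 * np.outer(v, v)

def bfgs_scale(s, y, B):
    """BFGS scale of equation (10); needs y^T s > 0 and s^T B s > 0."""
    return y + np.sqrt((y @ s) / (s @ B @ s)) * (B @ s)

def structured_bfgs_update(A, C_plus, s, y_sharp):
    """A_+ of (13): the scale sees the full model B^S = A + C(x_+)."""
    B_S = A + C_plus
    y_S = y_sharp + C_plus @ s
    v = bfgs_scale(s, y_S, B_S)
    return A + secant_correction(s, y_sharp, A, v)

# Hypothetical data: C(x_+) is positive definite and dominates the small
# symmetric matrices A and S_plus, so B^S and S_plus + C(x_+) are positive definite.
rng = np.random.default_rng(2)
n = 3
G, E1, E2 = (rng.standard_normal((n, n)) for _ in range(3))
C_plus = G @ G.T + n * np.eye(n)
A = 0.05 * (E1 + E1.T)
S_plus = 0.05 * (E2 + E2.T)
s = rng.standard_normal(n)
y_sharp = S_plus @ s                      # stands in for an approximation to S(x_+) s

A_plus = structured_bfgs_update(A, C_plus, s, y_sharp)
B_plus = A_plus + C_plus

secant_A = np.linalg.norm(A_plus @ s - y_sharp)                   # equation (12)
secant_B = np.linalg.norm(B_plus @ s - (y_sharp + C_plus @ s))    # B_+ s = y^S

# The correction is the same whether it is written for (y^S, B^S) or (y^#, A).
B_S, y_S = A + C_plus, y_sharp + C_plus @ s
v = bfgs_scale(s, y_S, B_S)
invariance_gap = np.linalg.norm(
    secant_correction(s, y_S, B_S, v) - secant_correction(s, y_sharp, A, v)
)
```

Note that $A$ itself need not be positive definite here; only $B^S$ enters the square root in the scale, which is the practical point of carrying the structure into $v$.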

In our analysis, we will use several different matrix norms. The Frobenius
norm will be denoted by $\|\cdot\|_F$, the Frobenius norm weighted by $\nabla^2 f(x_*)$ will
be denoted by $\|\cdot\|_*$, i.e. $\|\cdot\|_* = \|\nabla^2 f(x_*)^{-1/2} (\,\cdot\,) \nabla^2 f(x_*)^{-1/2}\|_F$, and the
$l_2$-operator norm will be denoted by $\|\cdot\|$. The only vector norm that will be
used is the Euclidean norm, and it will also be denoted by $\|\cdot\|$.

The standard assumptions for problem (1) are:

A1: Problem (1) has a local solution $x_*$.

A2: The function $f \in C^2$, and $\nabla^2 f$ and $C$ are locally Lipschitz continuous at
$x_*$, i.e., there exist constants $L > 0$, $L_C > 0$ and $\epsilon_1 > 0$ such that

\[ \|\nabla^2 f(x) - \nabla^2 f(x_*)\| \le L \|x - x_*\| \tag{14} \]

and

\[ \|C(x) - C(x_*)\| \le L_C \|x - x_*\| \tag{15} \]

for $x \in D_1 = \{x : \|x - x_*\| < \epsilon_1\}$.

A3: The matrix $\nabla^2 f(x_*)$ is positive definite, i.e., there exist positive constants
$m$ and $M$ such that

\[ m \|z\|^2 \le z^T \nabla^2 f(x_*) z \le M \|z\|^2 \quad \text{for all } z \in \mathbb{R}^n. \tag{16} \]

In this paper we will consider only the structured BFGS secant method. In
Section 2 we prove that the structured BFGS approximations to the Hessian
satisfy a surprising and strong form of bounded deterioration. In Section 3 we
establish local q-superlinear convergence for the structured BFGS secant method
using the Broyden, Dennis and Moré, Dennis and Moré, and Griewank and Toint
theories [Refs. 18, 19, 20]. Finally, in Section 4 we use this theory to prove the
local q-superlinear convergence of the structured BFGS secant method used in
the current version of the popular NL2SOL code for nonlinear least-squares
problems.


2.  Bounded  Deterioration  for  the  Structured  BFGS  Update. 

Our objective in this section is to demonstrate that the structured BFGS
approximations to the Hessian satisfy the bounded deterioration principle given
by Dennis in Ref. 21 and popularized by Broyden, Dennis and Moré in Ref. 18.
Moreover, we will prove that the BFGS secant updates satisfy a surprising and
stronger form of this principle. Specifically they satisfy, for $x$ and $B$ sufficiently
close to $x_*$ and $\nabla^2 f(x_*)$ respectively, the condition

\[ \|B_+ - \nabla^2 f(x_*)\|_* \le \|B - \nabla^2 f(x_*)\|_* + \alpha \, \sigma(x, x_+) \tag{17} \]

where $\sigma(u, v) = \max\{\|u - x_*\|, \|v - x_*\|\}$, and $\alpha$ is a nonnegative constant.

This fact will allow us to use the Broyden-Dennis-Moré theory to establish
that under the standard conditions the sequence $\{x_k\}$ generated by a structured
BFGS secant method is locally q-linearly convergent to $x_*$. The q-superlinear
convergence will then follow from Proposition 4 of Griewank and Toint [Ref. 20]
and the Dennis-Moré characterization [Ref. 19].

The bounds needed to prove inequality (17) when the structure in the
Hessian is not used follow from the fact that $y$ is a good approximation to
$\nabla^2 f(x_*) s$ and Assumption A3. We formalize this fact in the following
proposition.

PROPOSITION 2.1. Assume that Standard Assumption A3 holds and let $D$ be
a neighborhood of $x_*$. For $x_1, x_2 \in D$ define $s = x_2 - x_1$ and let $y$ be an
approximation to $\nabla^2 f(x_*) s$. If there exists $K_1 > 0$ such that

\[ \|y - \nabla^2 f(x_*) s\| \le K_1 \sigma(x_1, x_2) \|s\| \tag{18} \]

for all $x_1, x_2 \in D$, then the following inequalities hold:

\[ \|y\| \le (M + K_1 \sigma(x_1, x_2)) \|s\| \tag{19a} \]

\[ y^T s \le (M + K_1 \sigma(x_1, x_2)) \|s\|^2 \tag{19b} \]

where $M$ is given in Standard Assumption A3. Moreover, there exist positive
constants $\epsilon_2$ and $\beta$ such that the following inequalities hold:

\[ y^T s \ge \beta \|s\|^2 \tag{20a} \]

\[ \frac{\|y\| \, \|s\|}{y^T s} \le \frac{M + K_1 \sigma(x_1, x_2)}{\beta}, \quad s \ne 0 \tag{20b} \]

for $x_1, x_2 \in D_2 = \{x : \|x - x_*\| < \epsilon_2\} \subset D$.

Proof. Let $z = y - \nabla^2 f(x_*) s$ and $x_1, x_2 \in D$. Then (19) follows directly
from inequality (18) and Standard Assumption A3. To define $D_2$, choose $\epsilon_2$ so
that $K_1 \epsilon_2 < m$ and $D_2 \subset D$, where $m$ is given in Standard Assumption A3. If
$x_1, x_2 \in D_2$, then (20a) follows from Standard Assumption A3 with $\beta = m - K_1 \epsilon_2$.
Finally, notice that for $s \ne 0$

\[ \frac{\|y\| \, \|s\|}{y^T s} = \frac{\|y\|}{\|s\|} \cdot \frac{\|s\|^2}{y^T s} \]

so that (20b) follows from inequalities (19a) and (20a). ∎
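
The steps behind (19a) and (20a) can be spelled out as follows. This is a short sketch that uses only inequality (18), Assumption A3, and the abbreviations $B_* = \nabla^2 f(x_*)$ and $\sigma = \sigma(x_1, x_2)$; the abbreviations are introduced here for convenience.

\[ \|y\| \le \|B_* s\| + \|y - B_* s\| \le (M + K_1 \sigma) \|s\| , \]

\[ y^T s = s^T B_* s + (y - B_* s)^T s \ge m \|s\|^2 - \|y - B_* s\| \, \|s\| \ge (m - K_1 \sigma) \|s\|^2 \ge (m - K_1 \epsilon_2) \|s\|^2 = \beta \|s\|^2 . \]

The last inequality uses $\sigma < \epsilon_2$ for $x_1, x_2 \in D_2$, and $\beta > 0$ because $\epsilon_2$ was chosen with $K_1 \epsilon_2 < m$.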


Similarly, when the structure in the Hessian is used, the bounds needed to
establish bounded deterioration (17) follow from Standard Assumptions A2 and
A3, and the fact that $y^{\#}$ is a "good" approximation to $S(x_*) s$. We formulate
this fact in the next proposition.


PROPOSITION 2.2. Assume that Standard Assumption A2 holds and let $D$ be
a neighborhood of $x_*$. For $x_1, x_2 \in D$ define $s = x_2 - x_1$ and let $y^{\#}$ be an
approximation to $S(x_*) s$. If there exists $K_2 > 0$ such that

\[ \|y^{\#} - S(x_*) s\| \le K_2 \sigma(x_1, x_2) \|s\| \tag{21} \]

for all $x_1, x_2 \in D$, then there exists $K_3 > 0$ such that $y^S = y^{\#} + C(\bar{x}) s$ for any
$\bar{x} \in [x_1, x_2]$ satisfies

\[ \|y^S - \nabla^2 f(x_*) s\| \le K_3 \sigma(x_1, x_2) \|s\| \tag{22} \]

for all $x_1, x_2 \in D_1 \cap D$, where $D_1$ is given in Standard Assumption A2.

Proof. Let $x_1, x_2 \in D_1 \cap D$. Taking advantage of the structure in $y^S$ and in
the Hessian, we can write

\[
\begin{aligned}
\|y^S - \nabla^2 f(x_*) s\| &\le \|y^{\#} - S(x_*) s\| + \|[C(\bar{x}) - C(x_*)] s\| \\
&\le K_2 \sigma(x_1, x_2) \|s\| + L_C \|\bar{x} - x_*\| \, \|s\| \\
&\le (K_2 + L_C) \, \sigma(x_1, x_2) \|s\| . \quad ∎
\end{aligned}
\]

The  next  lemma  is  very  useful  when  dealing  with  weighted  Frobenius  norms. 
Particular  cases  of  it  were  established  by  Powell  and  by  Griewank  and  Toint 
[Ref.  22,  20]. 


LEMMA 2.3. Consider a symmetric matrix $B \in \mathbb{R}^{n \times n}$ and vectors $u, z \in \mathbb{R}^n$.
Suppose that

\[ u^T u = 1 \quad \text{and} \quad u^T B u = (u^T z)^2 . \tag{23} \]

If we define

\[ B' = B + u u^T - z z^T , \tag{24} \]

then

\[ \|B' - I\|_F^2 = \|B - I\|_F^2 - \left\{ (1 - z^T z)^2 + 2 \left( z^T B z - (z^T z)^2 \right) \right\} . \tag{25} \]

Moreover, if $B$ is symmetric and positive definite, $u = v / \|v\|$ and $z = Bv / \sqrt{v^T B v}$
for some vector $v \in \mathbb{R}^n$, $v \ne 0$, then

\[ \|B' - I\|_F \le \|B - I\|_F . \tag{26} \]


Proof. The first part, (25), is a straightforward application of $\|A\|_F^2 =
\operatorname{trace}(A^T A)$, $\operatorname{trace}(A + B) = \operatorname{trace}(A) + \operatorname{trace}(B)$, and $\operatorname{trace}(x y^T) = x^T y$.
Observe that

\[
\begin{aligned}
(B' - I)^T (B' - I) = {} & (B - I)^T (B - I) + (B - I) u u^T + u u^T (B - I) \\
& - (B - I) z z^T - z z^T (B - I) + (u^T u) u u^T \\
& + (z^T z) z z^T - (u^T z) u z^T - (z^T u) z u^T
\end{aligned}
\]

and so

\[
\begin{aligned}
\operatorname{trace} (B' - I)^T (B' - I) = {} & \operatorname{trace} (B - I)^T (B - I) + 2 u^T (B - I) u \\
& - 2 z^T (B - I) z + (u^T u)^2 + (z^T z)^2 - 2 (u^T z)^2 .
\end{aligned}
\]

Finally, we obtain (25) using (23).

To demonstrate (26), notice that the given $u$ and $z$ satisfy (23) for any
vector $v \ne 0$. Therefore, (26) will be true if $z^T B z - (z^T z)^2 \ge 0$.

Using the definition of $z$ we have

\[ z^T B z - (z^T z)^2 = \frac{v^T B^3 v}{v^T B v} - \left( \frac{v^T B^2 v}{v^T B v} \right)^2 = \frac{v^T B^3 v \cdot v^T B v - (v^T B^2 v)^2}{(v^T B v)^2} . \]

We will now show that the numerator of the last expression is nonnegative. From
the Cauchy-Schwarz inequality we have

\[ v^T B^3 v \cdot v^T B v = \|B^{3/2} v\|^2 \, \|B^{1/2} v\|^2 = \left( \|B^{3/2} v\| \, \|B^{1/2} v\| \right)^2 \ge \left[ (B^{3/2} v)^T (B^{1/2} v) \right]^2 = (v^T B^2 v)^2 . \quad ∎ \]
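
A quick numerical check of Lemma 2.3 may help make the identity (25) and the inequality (26) concrete. The sketch below is an illustration only: the random matrix, the seed, and the variable names are assumptions made here, and a single random instance is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
G = rng.standard_normal((n, n))
B = G @ G.T + np.eye(n)              # symmetric positive definite test matrix
v = rng.standard_normal(n)           # any nonzero vector

# The particular u and z used in the second part of the lemma.
u = v / np.linalg.norm(v)
z = B @ v / np.sqrt(v @ B @ v)
B_prime = B + np.outer(u, u) - np.outer(z, z)    # equation (24)

I = np.eye(n)
zz = z @ z
# Bracketed term of (25); the proof shows it is nonnegative for this u, z.
decrease = (1.0 - zz) ** 2 + 2.0 * (z @ B @ z - zz ** 2)

lhs_25 = np.linalg.norm(B_prime - I, "fro") ** 2
rhs_25 = np.linalg.norm(B - I, "fro") ** 2 - decrease
hypothesis_gap = abs(u @ B @ u - (u @ z) ** 2)   # second condition in (23)
```

Here `lhs_25` and `rhs_25` are the two sides of (25), and `decrease >= 0` is exactly what yields (26).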


Now we establish the bounded deterioration principle for the (unstructured)
BFGS secant approximations. The proof is based on the approach used by
Griewank and Toint [Ref. 20] for the Broyden convex class of secant updates.
However, our result is stronger than the specialization to the BFGS of their result
(we obtain a sharper bounded deterioration inequality). Moreover, in order to
fully expose the ideas involved, we will not assume that the problem has been
transformed so that the Hessian at $x_*$ is the identity matrix.

THEOREM 2.4. Suppose that Standard Assumption A3 holds. Let $B_+$ be the
(unstructured) BFGS secant update, i.e.

\[ B_+ = B + \Delta(s, y, B, v) \tag{27} \]

where $s = x_+ - x$, the scale $v$ is given by (10) and $y$ is an approximation to
$\nabla^2 f(x_*) s$. If $y$ satisfies inequality (18), then the bounded deterioration inequality

\[ \|B_+ - \nabla^2 f(x_*)\|_* \le \|B - \nabla^2 f(x_*)\|_* + \alpha_1 \sigma(x, x_+) \tag{28} \]

holds whenever $x, x_+ \in D_2$, where $D_2$ is given in Proposition 2.1.


Proof. Let $B_* = \nabla^2 f(x_*)$ and $x, x_+ \in D_2$. Recall that the BFGS secant
correction, (7) with (10), can also be written as

\[ \mathrm{BFGS}(s, y, B) = \frac{y y^T}{y^T s} - \frac{B s s^T B}{s^T B s} \tag{29} \]

(see Chapter 9 of Dennis and Schnabel [Ref. 11]). Define

\[ B' = B + \mathrm{BFGS}(s, B_* s, B) . \tag{30} \]

The idea of the proof is to determine bounds on $\|B' - B_*\|_*$ and
$\|B_+ - B'\|_*$ in terms of $\|B - B_*\|_*$ and then apply the triangle inequality to
obtain (28). Notice below that the strong form of bounded deterioration given
by (28) is a consequence of the fact that the difference between $B_+$ and $B'$ does
not depend on $B$.


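The bound on $\|B' - B_*\|_*$ follows from (26) in Lemma 2.3. If
$\hat{B}' = B_*^{-1/2} B' B_*^{-1/2}$, $\hat{B} = B_*^{-1/2} B B_*^{-1/2}$ and $v = B_*^{1/2} s$, we can write

\[
\begin{aligned}
\|B' - B_*\|_* &= \|B_*^{-1/2} (B' - B_*) B_*^{-1/2}\|_F = \|\hat{B}' - I\|_F \\
&= \left\| B_*^{-1/2} \left[ B - B_* + \frac{B_* s s^T B_*}{\|B_*^{1/2} s\|^2} - \frac{B s s^T B}{s^T B s} \right] B_*^{-1/2} \right\|_F \\
&= \left\| \hat{B} - I + \frac{v v^T}{v^T v} - \frac{\hat{B} v (\hat{B} v)^T}{v^T \hat{B} v} \right\|_F .
\end{aligned}
\]

Therefore, by (26),

\[ \|B' - B_*\|_* \le \|B - B_*\|_* . \tag{31} \]

To derive a bound on $\|B_+ - B'\|_*$, observe that

\[
\begin{aligned}
B_+ - B' &= \frac{y y^T}{y^T s} - \frac{B_* s s^T B_*}{s^T B_* s} \\
&= \frac{y (y - B_* s)^T}{y^T s} + \frac{(B_* s - y)^T s}{(y^T s)(s^T B_* s)} \, y s^T B_* + \frac{(y - B_* s) s^T B_*}{s^T B_* s} .
\end{aligned}
\]

Using A3, (18), (19), (20) and $\|x y^T\|_F = \|x\| \, \|y\|$, we have

\[
\begin{aligned}
\|B_+ - B'\|_F &\le \frac{\|y\| \, \|y - B_* s\|}{y^T s} + \frac{\|y\| \, \|B_* s\| \, \|y - B_* s\| \, \|s\|}{(y^T s)(s^T B_* s)} + \frac{\|B_* s\| \, \|y - B_* s\|}{s^T B_* s} \\
&= \frac{\|y\| \, \|s\|}{y^T s} \cdot \frac{\|y - B_* s\|}{\|s\|} + \frac{\|y\| \, \|s\|}{y^T s} \cdot \frac{\|B_* s\| \, \|s\|}{s^T B_* s} \cdot \frac{\|y - B_* s\|}{\|s\|} + \frac{\|y - B_* s\|}{\|s\|} \cdot \frac{\|B_* s\| \, \|s\|}{s^T B_* s} \\
&\le \left[ \frac{M + K_1 \sigma(x, x_+)}{\beta} + \frac{M + K_1 \sigma(x, x_+)}{\beta} \cdot \frac{M}{m} + \frac{M}{m} \right] K_1 \sigma(x, x_+) \\
&\le \left[ \frac{M + K_1 \epsilon_2}{\beta} + \frac{M + K_1 \epsilon_2}{\beta} \cdot \frac{M}{m} + \frac{M}{m} \right] K_1 \sigma(x, x_+) .
\end{aligned}
\]

Therefore

\[ \|B_+ - B'\|_* \le \|B_*^{-1/2}\|^2 \, \|B_+ - B'\|_F \le \alpha_1 \sigma(x, x_+) , \tag{32} \]

where

\[ \alpha_1 = \frac{K_1}{m} \left[ \frac{M + K_1 \epsilon_2}{\beta} + \frac{M + K_1 \epsilon_2}{\beta} \cdot \frac{M}{m} + \frac{M}{m} \right] . \]

Finally, using the triangle inequality, (31), and (32), we have

\[
\begin{aligned}
\|B_+ - B_*\|_* &\le \|B_+ - B'\|_* + \|B' - B_*\|_* \\
&\le \alpha_1 \sigma(x, x_+) + \|B - B_*\|_* ,
\end{aligned}
\tag{33}
\]

which is the strong form of bounded deterioration that we set out to prove. ∎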
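
The two ingredients of this proof can be checked numerically on random data. The sketch below is only an illustration under assumptions made here (random symmetric positive definite matrices, the hypothetical helper names `inverse_sqrt`, `weighted_norm` and `bfgs_correction`): it verifies inequality (31) in the weighted norm, and it confirms that the difference between $B_+$ and $B'$ is the same for two different matrices $B$.

```python
import numpy as np

def inverse_sqrt(P):
    """P^{-1/2} for a symmetric positive definite matrix P."""
    w, Q = np.linalg.eigh(P)
    return (Q / np.sqrt(w)) @ Q.T

def weighted_norm(E, W):
    """Weighted Frobenius norm ||W E W||_F, with W = B_*^{-1/2}."""
    return np.linalg.norm(W @ E @ W, "fro")

def bfgs_correction(s, y, B):
    """BFGS(s, y, B) of equation (29)."""
    Bs = B @ s
    return np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

# Hypothetical data: B_star plays the role of the Hessian at the solution.
rng = np.random.default_rng(3)
n = 4
G1, G2, G3 = (rng.standard_normal((n, n)) for _ in range(3))
B_star = G1 @ G1.T + np.eye(n)
B = G2 @ G2.T + np.eye(n)
B_other = G3 @ G3.T + np.eye(n)
s = rng.standard_normal(n)
y = B_star @ s + 1e-3 * rng.standard_normal(n)   # y close to B_* s, as in (18)

W = inverse_sqrt(B_star)
B_prime = B + bfgs_correction(s, B_star @ s, B)   # equation (30)
B_plus = B + bfgs_correction(s, y, B)             # equations (27) and (29)

norm_before = weighted_norm(B - B_star, W)
norm_after = weighted_norm(B_prime - B_star, W)   # should not exceed norm_before

# B_+ - B' computed from a different matrix B_other gives the same result.
diff_other = bfgs_correction(s, y, B_other) - bfgs_correction(s, B_star @ s, B_other)
independence_gap = np.linalg.norm((B_plus - B_prime) - diff_other)
```

The cancellation of the $B s s^T B$ terms in `B_plus - B_prime` is what removes any factor of $\|B - B_*\|_*$ from the bound (32).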
Finally, we prove an analogous result for the structured BFGS secant
approximations.


THEOREM 2.5. Suppose that Standard Assumptions A2 and A3 hold. Let
$B_+$ be the structured BFGS secant update, i.e.,

\[ B_+ = A_+ + C(x_+) \tag{34a} \]

where

\[ A_+ = A + \Delta(s, y^{\#}, A, v(s, y^S, B^S)) , \tag{34b} \]

$s = x_+ - x$, the scale $v$ is given by (10), and $y^S$ and $y^{\#}$ are approximations to
$\nabla^2 f(x_*) s$ and $S(x_*) s$ respectively such that $y^S - y^{\#} = C(\bar{x}) s$ for any $\bar{x} \in [x, x_+]$.
If $y^{\#}$ satisfies inequality (21), then there exists a neighborhood $D_3$ of $x_*$ such that

\[ \|B_+ - \nabla^2 f(x_*)\|_* \le \|B - \nabla^2 f(x_*)\|_* + \alpha_2 \sigma(x, x_+) \tag{35} \]

holds whenever $x, x_+ \in D_3 = D_1 \cap D_2$, where $D_1$ and $D_2$ are given in A2 and
Proposition 2.1 respectively.

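Proof. Let $B_* = \nabla^2 f(x_*)$, $B^S = A + C(\bar{x})$ and restrict $D_3$ as needed so
that $B^S$ is positive definite.

Now, using (34) and the following simple observation (which we commented
on in Section 1):

\[ \Delta(s, y^{\#}, A, v) = \Delta(s, y^S, B^S, v) , \]

we have for $x, x_+ \in D_3$

\[
\begin{aligned}
B_+ &= A_+ + C(x_+) \\
&= A + \Delta(s, y^{\#}, A, v) + C(x_+) \\
&= A + \Delta(s, y^S, B^S, v) + C(x_+) \\
&= B^S - C(\bar{x}) + \Delta(s, y^S, B^S, v) + C(x_+) \\
&= B^S + \Delta(s, y^S, B^S, v) + C(x_+) - C(\bar{x}) .
\end{aligned}
\tag{36}
\]

Since Proposition 2.2 allows us to use Theorem 2.4, and
$B^S = B + C(\bar{x}) - C(x)$, we can write

\[
\begin{aligned}
\|B_+ - B_*\|_* &\le \|B^S + \Delta(s, y^S, B^S, v) - B_*\|_* + \|C(x_+) - C(\bar{x})\|_* \\
&\le \|B^S - B_*\|_* + \alpha_1 \sigma(x, x_+) + \sqrt{n} \, L_C \|B_*^{-1/2}\|^2 \left( \|x_+ - x_*\| + \|\bar{x} - x_*\| \right) \\
&\le \|B - B_*\|_* + \|C(\bar{x}) - C(x)\|_* + \alpha_1 \sigma(x, x_+) + \frac{2 \sqrt{n} \, L_C}{m} \sigma(x, x_+) \\
&\le \|B - B_*\|_* + \left[ \alpha_1 + \frac{4 \sqrt{n} \, L_C}{m} \right] \sigma(x, x_+) ,
\end{aligned}
\]

which is (35) with $\alpha_2 = \alpha_1 + 4 \sqrt{n} \, L_C / m$, where $\alpha_1$ is given in Theorem 2.4. ∎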

m 


3.  Local  Convergence  Theory. 

In this section we will establish the local and q-superlinear convergence of
the structured BFGS secant method defined in Section 1. Our approach will be
to use the results of Section 2 and the Broyden-Dennis-Moré theory to prove local
q-linear convergence. Then we use (36), Proposition 4 of Griewank and Toint
[Ref. 20] and the Dennis-Moré characterization [Ref. 19] to obtain q-superlinear
convergence. For completeness we restate the Griewank-Toint proposition as
follows.

PROPOSITION 3.1 (Griewank and Toint [Ref. 20]). Suppose that Standard
Assumptions A1, A2 and A3 hold. Let $\{x_k\}$ be a sequence which converges to $x_*$
and satisfies

\[ \sum_{k \ge 0} \|x_k - x_*\| < \infty . \tag{37} \]

Also, let $\{B_k\}$, the approximations to the Hessian, be generated by (2) and (6)-(7)
starting with a symmetric positive definite matrix $B_0$. Then

\[ \lim_{k \to \infty} \frac{\|(B_k - \nabla^2 f(x_*)) s_k\|}{\|s_k\|} = 0 . \tag{38} \]


The next theorem gives sufficient conditions to ensure local q-superlinear
convergence for the structured BFGS secant method.

THEOREM 3.2. Suppose that Standard Assumptions A1, A2 and A3 hold. If
$s = x_2 - x_1$, and $y^S$ and $y^{\#}$ are approximations to $\nabla^2 f(x_*) s$ and $S(x_*) s$ respectively
such that $y^S - y^{\#} = C(\bar{x}) s$ for some $\bar{x} \in [x_1, x_2]$, and $y^{\#}$ satisfies

\[ \|y^{\#} - S(x_*) s\| \le K_2 \sigma(x_1, x_2) \|s\| \]

for $x_1, x_2 \in D$ and some $K_2 > 0$, then there exist positive constants $\epsilon$, $\delta$ such that
for $x_0 \in \mathbb{R}^n$ and symmetric $A_0 \in \mathbb{R}^{n \times n}$ satisfying $\|x_0 - x_*\| < \epsilon$ and
$\|A_0 - S(x_*)\| < \delta$, the sequence $\{x_k\}$ generated by the structured BFGS secant
method for problem (1) is q-superlinearly convergent to $x_*$.

Proof. As was the case in Dennis and Walker [Ref. 1], the local q-linear
convergence is a straightforward application of bounded deterioration (Theorem
2.5 in this case) and the standard Broyden-Dennis-Moré theory. Let
$B_* = \nabla^2 f(x_*)$ and $A_* = S(x_*)$.

Since $B_*$ is positive definite, there exist neighborhoods $N_1$ of $x_*$ and $N_2$ of
$B_*$ which are sufficiently small so that $N_1 \subset D_3$, $N_2$ contains only positive
definite matrices and $x_+ \in D_3$ for every $(x, B) \in N_1 \times N_2$. Now, choose a
neighborhood $N_3$ of $A_*$ and restrict $N_1$ as needed so that $(x, A) \in N = N_1 \times N_3$
implies that $A + C(x) \in N_2$.

Theorem 2.5 allows us to use Theorem 3.2 of Broyden, Dennis and Moré
[Ref. 18] to prove that $\{x_k\}$ converges q-linearly to $x_*$. Now since the difference
between $B_+$, the structured BFGS secant update, and an (unstructured) BFGS
secant update is of size $\sigma(x, x_+)$ (see (36)), we can use Proposition 3.1 to prove
that the sequence of structured BFGS secant updates satisfies the limit (38).
Finally, from Theorem 2.2 of Dennis and Moré [Ref. 19] we conclude that the
rate of convergence is q-superlinear. ∎


4.  Application  to  Nonlinear  Least  Squares. 

In this section we apply the result of Section 3 to establish the local and
q-superlinear convergence of the structured BFGS secant method for the nonlinear
least-squares problem as implemented in the current version of the NL2SOL
code given in Refs. 9 and 10. Our presentation of the nonlinear least-squares
problem follows Chapter 10 of Dennis and Schnabel [Ref. 11].

The nonlinear least-squares problem is

\[ \text{minimize } f(x) = \frac{1}{2} R(x)^T R(x) = \frac{1}{2} \sum_{i=1}^{m} r_i(x)^2 \tag{39} \]

where $m \ge n$, the residual function $R : \mathbb{R}^n \to \mathbb{R}^m$ is nonlinear and $r_i(x)$ denotes
the $i$th component function of $R(x)$. Straightforward calculations show that the
gradient of $f$ is given by

\[ \nabla f(x) = J(x)^T R(x) \tag{40} \]

where $J(x)$ denotes the Jacobian of $R$ at $x$, and the Hessian of $f$ is given by

\[ \nabla^2 f(x) = C(x) + S(x) \tag{41} \]

where

\[ C(x) = J(x)^T J(x) , \tag{42a} \]

\[ S(x) = \sum_{i=1}^{m} r_i(x) \nabla^2 r_i(x) , \tag{42b} \]

and $\nabla^2 r_i(x)$ is the Hessian of $r_i$ at $x$.

As we mentioned in the introduction, the use of structure is an important
part of the formulation of any secant method for the least-squares problem (39).
Among all the suggested secant formulations for this problem, one of the most
popular and successful is the NL2SOL code of Dennis, Gay and Welsch [Refs. 9,
10]. The choice of $y^{\#}$ used in this code is

\[ y^{\#} = [J(x_+) - J(x)]^T R(x_+) \tag{43} \]

which was given by Dennis [Ref. 4] and, independently, by Bartholomew-Biggs
[Ref. 7].
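
To make the quantities in (40)-(43) concrete, the following sketch evaluates them for a small made-up residual function. Everything specific here is an assumption for illustration: the three residuals, the points `x` and `x_plus`, and the helper names are hypothetical. The residuals are chosen to be quadratic, so that each $\nabla^2 r_i$ is constant and the choice (43) reproduces $S(x_+) s$ exactly; for general residuals (43) is only an approximation to $S(x_+) s$.

```python
import numpy as np

# Hypothetical residuals with m = 3, n = 2; each r_i is quadratic in x.
def R(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1], x[0] * x[1] - 0.5])

def J(x):
    """Jacobian of R; row i is the gradient of r_i."""
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0], [x[1], x[0]]])

# Constant Hessians of r_1, r_2, r_3 for the residuals above.
RESIDUAL_HESSIANS = [2.0 * np.eye(2), np.zeros((2, 2)), np.array([[0.0, 1.0], [1.0, 0.0]])]

def C(x):
    """First-order part of the Hessian, equation (42a)."""
    Jx = J(x)
    return Jx.T @ Jx

def S(x):
    """Second-order part of the Hessian, equation (42b)."""
    return sum(r_i * H_i for r_i, H_i in zip(R(x), RESIDUAL_HESSIANS))

def grad_f(x):
    """Gradient of f, equation (40)."""
    return J(x).T @ R(x)

x = np.array([0.9, 0.4])
s = np.array([-0.05, 0.1])
x_plus = x + s

y_sharp = (J(x_plus) - J(x)).T @ R(x_plus)     # equation (43)
y = grad_f(x_plus) - grad_f(x)                 # equation (4)
y_default = y - C(x_plus) @ s                  # default choice of Dennis and Walker

gap_43 = np.linalg.norm(y_sharp - S(x_plus) @ s)
gap_default = np.linalg.norm(y_default - S(x_plus) @ s)
```

Comparing `gap_43` with `gap_default` on an example like this gives a feel for the remark in Section 1 that the structure of the problem can suggest a better $y^{\#}$ than the default choice.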

The NL2SOL code originally used the structured DFP secant update ((7) with (9))
suggested by Dennis and Welsch [Ref. 8] and analyzed by Dennis and Walker
[Ref. 1], but now it uses the structured BFGS secant update ((7) with (10)) suggested
by Al-Baali and Fletcher in Ref. 12. While the authors report improved
numerical results, there is no local convergence theory for the new version of the
algorithm. We establish such a theory in the following paragraphs.

Consider the following standard assumptions for problem (39).

A1: Problem (39) has a local solution $x_*$.

A2: The function $f \in C^2$ and $J$ and $\nabla^2 f$ are locally Lipschitz continuous at $x_*$,
i.e., there exist $L_1$, $L_2$, and $\epsilon$ such that

\[ \|J(x) - J(x_*)\| \le L_1 \|x - x_*\| \tag{44a} \]

and

\[ \|\nabla^2 f(x) - \nabla^2 f(x_*)\| \le L_2 \|x - x_*\| \tag{44b} \]

for $x \in D = \{x : \|x - x_*\| < \epsilon\}$.

A3: The matrix $\nabla^2 f(x_*)$ is positive definite.

The following lemma serves as the foundation of our convergence result.

LEMMA 4.1. Suppose that the standard assumptions for problem (39) hold.
Then there exists a positive constant $K$ such that

\[ \|y^{\#} - S(x_*) s\| \le K \sigma(x, x_+) \|s\| \tag{45} \]

where $y^{\#}$ is given by (43), $x, x_+ \in D$, and $s = x_+ - x$.

Proof. Observe that by adding and subtracting the appropriate terms we
have

\[
\begin{aligned}
y^{\#} - S(x_*) s &= J(x_+)^T R(x_+) - J(x)^T R(x_+) - S(x_*) s \\
&= \left[ \nabla f(x_+) - \nabla f(x) - \nabla^2 f(x_*) s \right] - J(x)^T \left[ R(x_+) - R(x) - J(x_*) s \right] \\
&\quad - \left[ J(x) - J(x_*) \right]^T J(x_*) s .
\end{aligned}
\tag{46}
\]

From (44) and Lemma 4.1.15 in Dennis and Schnabel [Ref. 11] we have

\[ \|\nabla f(x_+) - \nabla f(x) - \nabla^2 f(x_*) s\| \le L_2 \sigma(x, x_+) \|s\| \tag{47a} \]

and

\[ \|R(x_+) - R(x) - J(x_*) s\| \le L_1 \sigma(x, x_+) \|s\| . \tag{47b} \]

Therefore, using (46) and (47),

\[
\begin{aligned}
\|y^{\#} - S(x_*) s\| &\le L_2 \sigma(x, x_+) \|s\| + \|J(x)\| L_1 \sigma(x, x_+) \|s\| + \|J(x_*)\| L_1 \|x - x_*\| \, \|s\| \\
&\le \left[ L_2 + (L_1 \epsilon + L_*) L_1 + L_* L_1 \right] \sigma(x, x_+) \|s\|
\end{aligned}
\]

where $L_* = \|J(x_*)\|$. ∎

THEOREM 4.2. Suppose that the standard assumptions for problem (39) hold.
Then there exist positive constants $\epsilon$, $\delta$ such that for $x_0 \in \mathbb{R}^n$ and symmetric
$A_0 \in \mathbb{R}^{n \times n}$ satisfying $\|x_0 - x_*\| < \epsilon$ and $\|A_0 - S(x_*)\| < \delta$, the iteration
sequence $\{x_k\}$ generated by the structured BFGS secant method for problem (39) is
q-superlinearly convergent to $x_*$.

Proof. The proof of this theorem is a straightforward application of
Theorem 3.2 and Lemma 4.1. ∎
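
The local iteration covered by Theorem 4.2 can be sketched as follows. This is a bare illustration under explicit assumptions, not the NL2SOL implementation: the residual function is a made-up zero-residual example with solution $x_* = (1, 2)$, the function names are hypothetical, there is no globalization (no line search or trust region), and the test that skips the update when $y^{S\,T} s \le 0$ is a safeguard added here that the paper does not discuss. The step is $B s = -\nabla f(x)$ with $B = A + C(x)$, and $A$ is updated by (13) with $y^{\#}$ from (43).

```python
import numpy as np

# Hypothetical zero-residual problem: R(x_star) = 0 at x_star = (1, 2).
def R(x):
    return np.array([x[0] - 1.0, x[1] - 2.0, (x[0] - 1.0) * (x[1] - 2.0)])

def J(x):
    return np.array([[1.0, 0.0], [0.0, 1.0], [x[1] - 2.0, x[0] - 1.0]])

def secant_correction(s, y, B, v):
    """Correction Delta(s, y, B, v) of equation (7)."""
    r = y - B @ s
    vs = v @ s
    return (np.outer(r, v) + np.outer(v, r)) / vs - (r @ s) / vs**2 * np.outer(v, v)

def bfgs_scale(s, y, B):
    """BFGS scale of equation (10)."""
    return y + np.sqrt((y @ s) / (s @ B @ s)) * (B @ s)

def structured_bfgs(x0, A0, max_iter=20, tol=1e-12):
    """Local structured BFGS secant iteration for problem (39), no globalization."""
    x, A = np.array(x0, dtype=float), np.array(A0, dtype=float)
    history = [x.copy()]
    for _ in range(max_iter):
        Jx, Rx = J(x), R(x)
        g = Jx.T @ Rx                              # gradient, equation (40)
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(A + Jx.T @ Jx, -g)     # step, equation (3) with B = A + C(x)
        x_plus = x + s
        J_plus, R_plus = J(x_plus), R(x_plus)
        C_plus = J_plus.T @ J_plus
        y_sharp = (J_plus - Jx).T @ R_plus         # equation (43)
        B_S, y_S = A + C_plus, y_sharp + C_plus @ s
        # Safeguard (an assumption of this sketch): update only when the scale is defined.
        if y_S @ s > 0.0 and s @ B_S @ s > 0.0:
            A = A + secant_correction(s, y_sharp, A, bfgs_scale(s, y_S, B_S))
        x = x_plus
        history.append(x.copy())
    return x, history

x_star = np.array([1.0, 2.0])
x_final, history = structured_bfgs(x0=[1.1, 1.9], A0=np.zeros((2, 2)))
final_error = np.linalg.norm(x_final - x_star)
errors = [np.linalg.norm(xk - x_star) for xk in history]
```

Starting with $A_0 = 0$ matches $S(x_*) = 0$ for this zero-residual example, so the hypotheses of the theorem on $A_0$ hold trivially; the list `errors` can be inspected to see how quickly the iterates approach $x_*$.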

5.  Conclusions  and  Summary. 

In this paper we have defined, and established the local and q-superlinear
convergence of, the structured BFGS secant method for unconstrained
optimization. Moreover, we have introduced the structure principle as a tool for
formulating the appropriate scale in any structured secant update for a
particular problem or application. Indeed, in his Ph.D. thesis [Ref. 23], Martinez
defined, and derived a convergence theory for, structured secant methods
generated from the entire Broyden convex class, using this structure principle to
extend the definition of the scale from unstructured to structured applications.

Although additional work is needed to develop a global convergence theory
for these structured algorithms, we think that the surprising and stronger form
of bounded deterioration proved here may be useful in the development of this
global theory, especially in the context where a trust-region globalization strategy
is used.

Finally, as a direct application of the theory given in this paper, we gave the
first proof of local and q-superlinear convergence of the important structured
BFGS secant method for the nonlinear least-squares problem which is used by
Dennis, Gay and Welsch in the current version of the popular NL2SOL code.